Consistency of Spectral Clustering

Authors

  • MIKHAIL BELKIN
  • OLIVIER BOUSQUET
Abstract

Consistency is a key property of all statistical procedures analyzing randomly sampled data. Surprisingly, despite decades of work, little is known about consistency of most clustering algorithms. In this paper we investigate consistency of the popular family of spectral clustering algorithms, which clusters the data with the help of eigenvectors of graph Laplacian matrices. We develop new methods to establish that, for increasing sample size, those eigenvectors converge to the eigenvectors of certain limit operators. As a result, we can prove that one of the two major classes of spectral clustering (normalized clustering) converges under very general conditions, while the other (unnormalized clustering) is only consistent under strong additional assumptions, which are not always satisfied in real data. We conclude that our analysis provides strong evidence for the superiority of normalized spectral clustering.
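
The two classes of spectral clustering compared in the abstract differ in which graph Laplacian supplies the eigenvectors. The following is a minimal sketch, not the authors' code: it assumes the common definitions L = D - W (unnormalized) and L_sym = I - D^{-1/2} W D^{-1/2} (normalized), and the toy graph, the function names, and the sign-based two-way split are illustrative choices that do not come from the paper.

```python
import numpy as np

def graph_laplacians(W):
    """Return (L, L_sym) for a symmetric, non-negative weight matrix W.

    L     = D - W                      (unnormalized)
    L_sym = I - D^{-1/2} W D^{-1/2}    (normalized)
    Assumes every vertex has positive degree.
    """
    W = np.asarray(W, dtype=float)
    d = W.sum(axis=1)                  # vertex degrees
    L = np.diag(d) - W
    d_inv_sqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    return L, L_sym

def two_way_split(laplacian):
    """Split vertices by the sign of the second-smallest eigenvector."""
    _, eigvecs = np.linalg.eigh(laplacian)   # eigenvalues in ascending order
    return eigvecs[:, 1] >= 0

# Hypothetical toy graph: two triangles joined by one weak edge.
W = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5)]:
    W[i, j] = W[j, i] = 1.0
W[2, 3] = W[3, 2] = 0.1

L, L_sym = graph_laplacians(W)
labels_unnorm = two_way_split(L)
labels_norm = two_way_split(L_sym)
```

On a small, well-separated graph like this one both variants recover the same two groups. The abstract's claim concerns the large-sample limit, which a toy example of this size cannot demonstrate.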

Similar resources

Analysis of spectral clustering algorithms for community detection: the general bipartite setting

We consider the analysis of spectral clustering algorithms for community detection under a stochastic block model (SBM). A general spectral clustering algorithm consists of three steps: (1) regularization of an appropriate adjacency or Laplacian matrix (2) a form of spectral truncation and (3) a k-means type algorithm in the reduced spectral domain. By varying each step, one can obtain differen...

On defining affinity graph for spectral clustering through ranking on manifolds

Spectral clustering consists of two distinct stages: (a) construct an affinity graph from the dataset and (b) cluster the data points through finding an optimal partition of the affinity graph. The focus of the paper is the first step. Existing spectral clustering algorithms adopt Gaussian function to define the affinity graph since it is easy to implement. However, Gaussian function is hard to...
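
The Gaussian affinity mentioned above is commonly written as w_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)). As a hedged illustration of the first stage only, the sketch below builds such a matrix with NumPy. The function name `gaussian_affinity`, the bandwidth `sigma`, the zeroed diagonal, and the sample points are assumptions made for this example and are not taken from the cited paper.

```python
import numpy as np

def gaussian_affinity(X, sigma=1.0):
    """Dense Gaussian affinity matrix for points X of shape (n, d).

    w_ij = exp(-||x_i - x_j||^2 / (2 * sigma^2)), with the diagonal set
    to zero so that the graph has no self-loops (an illustrative choice).
    """
    X = np.asarray(X, dtype=float)
    diff = X[:, None, :] - X[None, :, :]     # pairwise differences
    sq_dists = (diff ** 2).sum(axis=-1)      # squared Euclidean distances
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    return W

# Hypothetical data: two tight pairs of points far apart from each other.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
W = gaussian_affinity(X, sigma=0.5)
```

Nearby points receive weights close to 1 and distant points weights close to 0, so the result depends on how `sigma` is chosen.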

Model-free Consistency of Graph Partitioning

In this paper, we exploit the theory of dense graph limits to provide a new framework to study the stability of graph partitioning methods, which we call structural consistency. Both stability under perturbation as well as asymptotic consistency (i.e., convergence with probability 1 as the sample size goes to infinity under a fixed probability model) follow from our notion of structural consist...

A variational approach to the consistency of spectral clustering

This paper establishes the consistency of spectral approaches to data clustering. We consider clustering of point clouds obtained as samples of a ground-truth measure. A graph representing the point cloud is obtained by assigning weights to edges based on the distance between the points they connect. We investigate the spectral convergence of both unnormalized and normalized graph Laplacians to...

Comparison of Combination Methods using Spectral Clustering Ensembles

We address the problem of the combination of multiple data partitions, that we call a clustering ensemble. We use a recent clustering approach, known as Spectral Clustering, and the classical K-Means algorithm to produce the partitions that constitute the clustering ensembles. A comparative evaluation of several combination methods is performed by measuring the consistency between the combined ...

Publication date: 2004